Abstract

In this paper, we consider a generalized longest common subsequence problem with multiple substring
exclusion constraints. For two input sequences $X$ and $Y$ of lengths $n$ and $m$, and a set of $d$ constraints
$P = \{P_1, \cdots, P_d\}$ of total length $r$, the problem is to find a longest common subsequence $Z$ of $X$ and $Y$
that contains none of the constraint strings in $P$ as a substring. The problem was declared
to be NP-hard [1], but we found that this is not true. A new dynamic programming solution
for this problem is presented in this paper, and the correctness of the new algorithm is proved. The time
complexity of our algorithm is $O(nmr)$.

1 Introduction 

In this paper, we consider a generalized longest common subsequence problem with multiple substring
exclusion constraints. The longest common subsequence (LCS) problem is a well-known measure for
computing the similarity of two strings. It can be widely applied in diverse areas, such as file comparison, 
pattern matching and computational biology [SI SI [HI E] • 

Given two sequences X and Y, the longest common subsequence (LCS) problem is to find a subsequence 
of X and Y whose length is the longest among all common subsequences of the two given sequences. 
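As a point of reference for the constrained variants discussed later, the unconstrained LCS length can be computed by the classic quadratic dynamic program. The following is a minimal sketch; the function name `lcs_length` is illustrative and does not appear in the paper.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y (classic O(nm) DP)."""
    n, m = len(x), len(y)
    # dp[i][j] = LCS length of the prefixes x[:i] and y[:j];
    # row 0 and column 0 correspond to an empty prefix.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]
```

For instance, `lcs_length("ABCBDAB", "BDCABA")` evaluates to 4.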

For some biological applications, constraints must be applied to the LCS problem. Such
variants of the LCS problem are called constrained LCS (CLCS) problems. Recently, Chen and Chao [1]
proposed a more general form of the CLCS problem, the generalized constrained longest common
subsequence (GC-LCS) problem. For two input sequences $X$ and $Y$ of lengths $n$ and $m$, respectively, and
a constraint string $P$ of length $r$, the GC-LCS problem is a set of four problems, which are to find the LCS of
$X$ and $Y$ including/excluding $P$ as a subsequence/substring, respectively. The four GC-LCS
problems are summarized in Table 1.



Table 1: The GC-LCS problems

Problem     | Input           | Output
------------+-----------------+------------------------------------------------------------------------
SEQ-IC-LCS  | $X$, $Y$, and $P$ | The longest common subsequence of $X$ and $Y$ including $P$ as a subsequence
STR-IC-LCS  | $X$, $Y$, and $P$ | The longest common subsequence of $X$ and $Y$ including $P$ as a substring
SEQ-EC-LCS  | $X$, $Y$, and $P$ | The longest common subsequence of $X$ and $Y$ excluding $P$ as a subsequence
STR-EC-LCS  | $X$, $Y$, and $P$ | The longest common subsequence of $X$ and $Y$ excluding $P$ as a substring
The four GC-LCS problems can be generalized further to the case of multiple constraints. In these
generalized cases, the single constraint pattern $P$ is replaced by a set of $d$ constraints $P = \{P_1, \cdots, P_d\}$
of total length $r$, as shown in Table 2.



Table 2: The Multiple-GC-LCS problems

Problem       | Input                                                          | Output
--------------+----------------------------------------------------------------+--------------------------------------------------------------------------------------------
M-SEQ-IC-LCS  | $X$, $Y$, and a set of constraints $P = \{P_1, \cdots, P_d\}$ | The longest common subsequence of $X$ and $Y$ including each constraint $P_i \in P$ as a subsequence
M-STR-IC-LCS  | $X$, $Y$, and a set of constraints $P = \{P_1, \cdots, P_d\}$ | The longest common subsequence of $X$ and $Y$ including each constraint $P_i \in P$ as a substring
M-SEQ-EC-LCS  | $X$, $Y$, and a set of constraints $P = \{P_1, \cdots, P_d\}$ | The longest common subsequence of $X$ and $Y$ excluding each constraint $P_i \in P$ as a subsequence
M-STR-EC-LCS  | $X$, $Y$, and a set of constraints $P = \{P_1, \cdots, P_d\}$ | The longest common subsequence of $X$ and $Y$ excluding each constraint $P_i \in P$ as a substring



The Multiple-GC-LCS problem M-SEQ-IC-LCS has been proved to be NP-hard in [5]. The Multiple-GC-LCS
problem M-SEQ-EC-LCS has also been proved to be NP-hard in 002]. In addition, the Multiple-GC-LCS
problems M-STR-IC-LCS and M-STR-EC-LCS were also declared to be NP-hard in [1], but without
strict proofs. Exponential-time algorithms for solving these two problems were also presented in [1].

In this paper, we discuss the multiple STR-EC-LCS problem M-STR-EC-LCS. We present a cubic-time
algorithm for the M-STR-EC-LCS problem, which refutes the claim that this problem is NP-hard.

The organization of the paper is as follows.

In the following four sections, we describe our dynamic programming algorithm for the M-STR-EC-LCS
problem.

In Section 2, the preliminary knowledge needed for presenting our algorithm is discussed.
In Section 3, we give a new dynamic programming solution for the M-STR-EC-LCS problem with
time complexity $O(nmr)$, where $n$ and $m$ are the lengths of the two given input strings, and $r$ is the total
length of the $d$ constraint strings. In Section 4, we discuss how to implement the algorithm efficiently. Some
concluding remarks are given in Section 5.

2 Preliminaries 

A sequence is a string of characters over an alphabet $\Sigma$. A subsequence of a sequence $X$ is obtained by
deleting zero or more characters from $X$ (not necessarily contiguous). A substring of a sequence $X$ is a
subsequence of successive characters within $X$.

For a given sequence $X = x_1 x_2 \cdots x_n$ of length $n$, the $i$th character of $X$ is denoted by $x_i \in \Sigma$ for
any $i = 1, \cdots, n$. A substring of $X$ from position $i$ to $j$ is denoted by $X[i:j] = x_i x_{i+1} \cdots x_j$. If
$i \ne 1$ or $j \ne n$, then the substring $X[i:j]$ is called a proper substring of $X$. A substring
$X[i:j]$ is called a prefix or a suffix of $X$ if $i = 1$ or $j = n$, respectively.

For the two input sequences $X = x_1 x_2 \cdots x_n$ and $Y = y_1 y_2 \cdots y_m$ of lengths $n$ and $m$, respectively, and a
set of $d$ constraints $P = \{P_1, \cdots, P_d\}$ of total length $r$, the multiple STR-EC-LCS problem M-STR-EC-LCS
is to find an LCS of $X$ and $Y$ excluding each constraint $P_i \in P$ as a substring.
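To make the problem statement concrete, the following brute-force sketch enumerates subsequences of $X$ from longest to shortest and returns the length of the first one that is also a subsequence of $Y$ and contains no constraint string as a substring. It runs in exponential time and is intended only as a reference oracle for very short inputs; the function names are illustrative, and the constraint strings are assumed to be non-empty.

```python
from itertools import combinations


def is_subsequence(z: str, y: str) -> bool:
    """True if z can be obtained from y by deleting characters."""
    it = iter(y)
    return all(ch in it for ch in z)  # each `in` consumes the iterator


def brute_force_m_str_ec_lcs(x: str, y: str, constraints) -> int:
    """Exponential-time reference answer for tiny inputs (non-empty constraints assumed)."""
    for length in range(len(x), -1, -1):
        for idx in combinations(range(len(x)), length):
            z = "".join(x[i] for i in idx)
            if any(p in z for p in constraints):
                continue  # z contains a forbidden substring
            if is_subsequence(z, y):
                return length
    return 0
```

With $X = Y = abab$ and $P = \{ab\}$, every admissible string has the form $b^p a^q$, and the oracle returns 2.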

The most important difference between the problems STR-EC-LCS and M-STR-EC-LCS is the number
of constraints. For ease of discussion, we make the following two assumptions on the constraint set $P$.

Assumption 1 There are no duplicated strings in the constraint set $P$.

Assumption 2 No string in the constraint set $P$ is a proper substring of any other string in $P$.

The keyword tree [2, 7] is the main data structure used in our dynamic programming algorithm to process the
constraint set $P$ of the M-STR-EC-LCS problem.
Figure 1: Keyword Trees

Definition 1 The keyword tree for the set $P$ is a rooted directed tree $T$ satisfying three conditions: 1. each edge is
labeled with exactly one character; 2. any two edges out of the same node have distinct labels; and 3. every
string $P_i$ in $P$ maps to some node $v$ of $T$ such that the characters on the path from the root of $T$ to $v$ exactly
spell out $P_i$, and every leaf of $T$ is mapped to by some string in $P$.

For example, Figure 1(a) shows the keyword tree $T$ for the constraint set $P = \{aab, aba, ba\}$, where
$d = 3$ and $r = 8$. Clearly, every node in the keyword tree corresponds to a prefix of one of the strings in $P$,
and every prefix of a string $P_i$ in $P$ maps to a distinct node in the keyword tree $T$. The keyword tree for a set
$P$ of total length $r$ can be constructed in $O(r)$ time for a constant alphabet size. Because
no two edges out of any node of $T$ are labeled with the same character, the keyword tree $T$ can be used to
search for all occurrences in a text $X$ of strings from $P$.
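The construction can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function name `build_keyword_tree` and the list-of-dicts representation are assumptions, and children are visited in sorted character order so that the node numbers come out in preorder. Storing the full label of every node is done only for readability and costs more than $O(r)$ space.

```python
def build_keyword_tree(patterns):
    """Return (children, labels) for the keyword tree, nodes numbered in preorder.

    children[i] maps an edge character to the number of the child node;
    labels[i] is the string spelled on the path from the root to node i.
    """
    # Step 1: insert every pattern into a nested-dict trie.
    trie = {}
    for p in patterns:
        node = trie
        for ch in p:
            node = node.setdefault(ch, {})

    # Step 2: number the nodes by a preorder traversal.
    children, labels = [], []

    def visit(node, label):
        idx = len(labels)
        labels.append(label)
        children.append({})
        for ch in sorted(node):  # sorted order is an assumption of this sketch
            children[idx][ch] = visit(node[ch], label + ch)
        return idx

    visit(trie, "")
    return children, labels
```

For $P = \{aab, aba, ba\}$ this yields eight nodes with labels `""`, `a`, `aa`, `aab`, `ab`, `aba`, `b`, `ba`, numbered 0 to 7.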

The failure functions in the Knuth-Morris-Pratt (KMP) algorithm for the string matching problem can
be generalized to the keyword tree to speed up the exact matching of multiple patterns as
follows.

In order to identify the nodes of $T$, we assign the numbers $0, 1, \cdots, t-1$ to the $t$ nodes of $T$ in
preorder. Each node is thus assigned an integer $i$, $0 \le i < t$, as shown in Fig. 1. In
the following, the number of a node is also used as its state number in $T$.

For each node numbered $i$ of a keyword tree $T$, the concatenation of characters on the path from the
root to node $i$ spells out a string denoted by $L(i)$. The string $L(i)$ is also called the label of node $i$ in
the keyword tree $T$. For any node $i$ of $T$, define $lp(i)$ to be the length of the longest proper suffix of the string
$L(i)$ that is a prefix of some string in $P$.

It can be verified readily that for each node $i$ of $T$, if $A$ is the suffix of $L(i)$ of length $lp(i)$, then there
must be a unique node $pre(i)$ in $T$ such that $L(pre(i)) = A$. If $lp(i) = 0$, then $pre(i) = 0$ is the root of $T$.

Definition 2 The ordered pair $(i, pre(i))$ is called a failure link.

The failure link is a direct generalization of the failure function in the KMP algorithm. For example, in
Figure 1(a), failure links are shown as pointers from every node $i$ to node $pre(i)$ where $lp(i) > 0$. The other
failure links point to the root and are not shown.

The failure links of $T$ define a failure function $pre$ for the constraint set $P$.

For example, for the nodes $i = 1, 2, 3, 4, 5, 6, 7$ in Fig. 1(a), the corresponding values of the failure function are
$pre(i) = 0, 1, 4, 6, 7, 0, 1$.

The failure function $pre$ is used to speed up the search for all occurrences in a text $X$ of strings from $P$.
As stated in [7], the failure function $pre$ can be computed in $O(r)$ time.
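One common way to compute the failure function is a breadth-first traversal of the keyword tree, in the style of the Aho-Corasick construction. The sketch below assumes the tree is given as a list `children` of edge dictionaries with the root numbered 0; the constant `FIG1A_CHILDREN` is a hand transcription of the keyword tree for $P = \{aab, aba, ba\}$ with the preorder numbering used in the text, and both names are illustrative.

```python
from collections import deque

# Keyword tree for P = {aab, aba, ba}, preorder numbering:
# 0 = root, 1 = a, 2 = aa, 3 = aab, 4 = ab, 5 = aba, 6 = b, 7 = ba.
FIG1A_CHILDREN = [
    {"a": 1, "b": 6}, {"a": 2, "b": 4}, {"b": 3}, {},
    {"a": 5}, {}, {"a": 7}, {},
]


def failure_links(children):
    """Return pre, where pre[i] is the node reached by the failure link of node i."""
    pre = [0] * len(children)
    queue = deque(children[0].values())  # depth-1 nodes fail to the root
    while queue:
        u = queue.popleft()
        for ch, v in children[u].items():
            # Follow failure links from the parent until an edge labeled ch exists.
            w = pre[u]
            while w and ch not in children[w]:
                w = pre[w]
            pre[v] = children[w].get(ch, 0)
            queue.append(v)
    return pre
```

On the example tree, `failure_links(FIG1A_CHILDREN)` gives `[0, 0, 1, 4, 6, 7, 0, 1]`, which agrees with the values $pre(1), \cdots, pre(7)$ listed above (with $pre(0) = 0$ for the root by convention of this sketch).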
In the application of the keyword tree in our dynamic programming algorithm, a function $\sigma$ will be used
frequently. For a string $S$ and a given keyword tree $T$, if the label $L(i)$ of a node numbered $i$ is also a suffix
of $S$, then node $i$ is called a suffix node of $S$ in $T$.

Definition 3 For any string $S$ and a given keyword tree $T$, the unique suffix node of $S$ in $T$ with maximum
depth is denoted by $\sigma(S)$. That is:

$$|L(\sigma(S))| = \max_{0 \le i < t} \{\, |L(i)| \mid L(i) \text{ is a suffix of } S \,\} \tag{1}$$

For example, if $S = aabaaabb$, then in the keyword tree $T$ of Fig. 1(a), node 6 is the only suffix node of
$S$ in $T$ other than the root, and therefore $\sigma(S) = 6$.
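Since every node label is a prefix of some constraint string, the label of $\sigma(S)$ is simply the longest suffix of $S$ that is a prefix of a string in $P$. The following naive sketch computes that label directly from the definition, without building the tree; the function name `sigma_label` is illustrative.

```python
def sigma_label(s: str, patterns) -> str:
    """Label L(sigma(s)): the longest suffix of s that is a prefix of some pattern."""
    prefixes = {p[:k] for p in patterns for k in range(len(p) + 1)}
    # Try suffixes from longest to shortest; the empty suffix (the root) always matches.
    for start in range(len(s) + 1):
        if s[start:] in prefixes:
            return s[start:]
    return ""
```

For $S = aabaaabb$ and $P = \{aab, aba, ba\}$ the result is `b`, the label of node 6.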

In our keyword tree application, we are only interested in the nonleaf nodes of the tree. So, we can 
renumber the nodes of the tree only for nonleaf nodes, omitting the leaf nodes of the tree, as shown in 
Fig. 1(b). After renumbering, the failure function of the tree will also be changed accordingly. 

If a string $P_i$ in the constraint set $P$ is a proper substring of another string $P_j$ in $P$, then an LCS of $X$ and
$Y$ excluding $P_i$ must also exclude $P_j$. For this reason, the constraint string $P_j$ can be removed from the constraint
set $P$ without changing the solution of the problem. For example, the string $ba$ is a proper substring of the
string $aba$ in the keyword tree of Fig. 1(a). Therefore, the string $aba$ can be removed from the keyword tree,
as shown in Fig. 1(c). We will show shortly how to remove these redundant strings from the constraint set $P$ in
$O(r)$ time. In the following sections, the discussion is based on Assumptions 1 and 2 on the constraint set
$P$. The number of nonleaf nodes of the keyword tree for the constraint set $P$ is denoted by $s$. In the worst
case, $s = r - d + 1$, counting the root. The root of the keyword tree is numbered 0, and the other nonleaf nodes are numbered
$1, 2, \cdots, s-1$ in preorder. For example, in Fig. 1(c), there are $s = 4$ nonleaf nodes in $T$.
The labels of the four nonleaf nodes are $L(0) = \emptyset$, $L(1) = a$, $L(2) = aa$ and $L(3) = b$, respectively.
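The effect of this reduction can be checked with a small sketch. It removes duplicates and super-strings by direct substring tests, which takes quadratic time and is not the linear-time method based on failure links; it then lists the labels of the nonleaf nodes, which under Assumptions 1 and 2 are exactly the proper prefixes of the remaining strings. Both function names are illustrative.

```python
def reduce_constraints(patterns):
    """Drop duplicates and every string that contains another constraint as a substring."""
    unique = set(patterns)
    return sorted(p for p in unique
                  if not any(q != p and q in p for q in unique))


def nonleaf_labels(patterns):
    """Labels of the nonleaf nodes (proper prefixes), in preorder for sorted children."""
    # Lexicographic order of the labels coincides with preorder when the
    # children of every node are visited in sorted character order.
    return sorted({p[:k] for p in patterns for k in range(len(p))})
```

For $P = \{aab, aba, ba\}$ the reduced set is $\{aab, ba\}$, and the nonleaf labels are the empty string, `a`, `aa` and `b`.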

The symbol $\oplus$ is used to denote string concatenation. For example, if $S_1 = aaa$ and $S_2 = bbb$,
then it is readily seen that $S_1 \oplus S_2 = aaabbb$.



3 Our Main Result: A Dynamic Programming Algorithm 

In the following discussion, we call 'a sequence excluding each constraint string in $P$ as a substring'
a sequence excluding $P$ for short.

Definition 4 Let $Z(i,j,k)$ denote the set of all LCSs of $X[1:i]$ and $Y[1:j]$ excluding $P$ such that $\sigma(z) = k$ for
each $z \in Z(i,j,k)$, where $1 \le i \le n$, $1 \le j \le m$, and $0 \le k < s$. The length of an LCS in $Z(i,j,k)$ is denoted
by $f(i,j,k)$.

If we can compute $f(i,j,k)$ for any $1 \le i \le n$, $1 \le j \le m$, and $0 \le k < s$ efficiently, then the length of an
LCS of $X$ and $Y$ excluding $P$ must be $\max_{0 \le k < s} \{f(n,m,k)\}$.

By using the keyword tree data structure described in the last section, we can give a recursive formula 
for computing f(i,j,k) by the following Theorem. 

Theorem 1 For the two input sequences $X = x_1 x_2 \cdots x_n$ and $Y = y_1 y_2 \cdots y_m$ of lengths $n$ and $m$,
respectively, and a set of $d$ constraints $P = \{P_1, \cdots, P_d\}$ of total length $r$, let $Z(i,j,k)$ and $f(i,j,k)$ be defined as
in Definition 4. Suppose a keyword tree $T$ for the constraint set $P$ has been built, and the $s$ nonleaf nodes of $T$
are numbered in preorder. The label of the node numbered $k$ ($0 \le k < s$) is denoted by $L(k)$.
Then, for any $1 \le i \le n$, $1 \le j \le m$, and $0 \le k < s$, $f(i,j,k)$ can be computed by the following recursive
formula (2).

$$f(i,j,k) = \begin{cases} \max\{f(i-1,j,k),\ f(i,j-1,k)\} & \text{if } x_i \ne y_j, \\ \max\left\{ f(i-1,j-1,k),\ 1 + \max\limits_{0 \le q < s} \{ f(i-1,j-1,q) \mid \sigma(L(q) \oplus x_i) = k \} \right\} & \text{if } x_i = y_j. \end{cases} \tag{2}$$
The boundary conditions of this recursive formula are $f(i,0,k) = f(0,j,k) = 0$ for any $0 \le i \le n$,
$0 \le j \le m$, and $0 \le k < s$.
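The recurrence can be transcribed almost literally into a memoized recursion, which is convenient for checking small cases. The sketch below makes two assumptions that differ from the text: states that no admissible subsequence can reach are given the value $-\infty$ on the boundary instead of 0, so that $f(i,j,k)$ is an actual length whenever it is finite, and $\sigma$ is evaluated naively on node labels instead of through the keyword tree. The function name is illustrative, constraint strings are assumed non-empty, and the recursion depth limits this version to short inputs.

```python
from functools import lru_cache


def m_str_ec_lcs_recursive(x: str, y: str, patterns) -> int:
    # Nonleaf states: the root plus all proper prefixes of the constraint strings.
    labels = sorted({""} | {p[:k] for p in patterns for k in range(len(p))})
    index = {lab: k for k, lab in enumerate(labels)}
    neg_inf = float("-inf")  # "no admissible subsequence ends in this state"

    def sigma(q: int, ch: str):
        """State after appending ch in state q, or None if a constraint is completed."""
        s = labels[q] + ch
        if any(s.endswith(p) for p in patterns):
            return None
        return next(index[s[a:]] for a in range(len(s) + 1) if s[a:] in index)

    @lru_cache(maxsize=None)
    def f(i: int, j: int, k: int):
        if i == 0 or j == 0:
            return 0 if k == 0 else neg_inf
        if x[i - 1] != y[j - 1]:
            return max(f(i - 1, j, k), f(i, j - 1, k))
        best = f(i - 1, j - 1, k)
        for q in range(len(labels)):
            if sigma(q, x[i - 1]) == k:
                best = max(best, 1 + f(i - 1, j - 1, q))
        return best

    return max(f(len(x), len(y), k) for k in range(len(labels)))
```

For $X = Y = aab$ and $P = \{ab\}$ the result is 2, attained by the subsequence $aa$.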

Proof. 

For any $1 \le i \le n$, $1 \le j \le m$, and $0 \le k < s$, suppose $f(i,j,k) = t$ and $z = z_1, \cdots, z_t \in Z(i,j,k)$.

First of all, we notice that for each pair $(i', j')$, $1 \le i' \le n$, $1 \le j' \le m$, such that $i' \le i$ and $j' \le j$, we have
$f(i',j',k) \le f(i,j,k)$, since a common subsequence $z$ of $X[1:i']$ and $Y[1:j']$ excluding $P$ with $\sigma(z) = k$ is
also a common subsequence of $X[1:i]$ and $Y[1:j]$ excluding $P$ with $\sigma(z) = k$.

(1) In the case of $x_i \ne y_j$, we have $x_i \ne z_t$ or $y_j \ne z_t$.

(1.1) If $x_i \ne z_t$, then $z = z_1, \cdots, z_t$ is a common subsequence of $X[1:i-1]$ and $Y[1:j]$ excluding $P$ with
$\sigma(z_1, \cdots, z_t) = k$, and so $f(i-1,j,k) \ge t$. On the other hand, $f(i-1,j,k) \le f(i,j,k) = t$. Therefore, in
this case we have $f(i,j,k) = f(i-1,j,k)$.

(1.2) If $y_j \ne z_t$, then we can prove similarly that in this case $f(i,j,k) = f(i,j-1,k)$.

Combining the two subcases, we conclude that in the case of $x_i \ne y_j$, we have

$$f(i,j,k) = \max\{f(i-1,j,k),\ f(i,j-1,k)\}$$

(2) In the case of $x_i = y_j$, there are also two cases to be distinguished.

(2.1) If $x_i = y_j \ne z_t$, then $z = z_1, \cdots, z_t$ is also a common subsequence of $X[1:i-1]$ and $Y[1:j-1]$
excluding $P$ with $\sigma(z_1, \cdots, z_t) = k$, and so $f(i-1,j-1,k) \ge t$. On the other hand, $f(i-1,j-1,k) \le
f(i,j,k) = t$. Therefore, in this case we have $f(i,j,k) = f(i-1,j-1,k)$.

(2.2) If $x_i = y_j = z_t$, then $f(i,j,k) = t > 0$ and $z = z_1, \cdots, z_t$ is an LCS of $X[1:i]$ and $Y[1:j]$ excluding
$P$ with $\sigma(z_1, \cdots, z_t) = k$, and thus $z_1, \cdots, z_{t-1}$ is a common subsequence of $X[1:i-1]$ and $Y[1:j-1]$
excluding $P$.

Let $\sigma(z_1, \cdots, z_{t-1}) = q$ and $f(i-1,j-1,q) = h$. Then $z_1, \cdots, z_{t-1}$ is a common subsequence of $X[1:i-1]$
and $Y[1:j-1]$ excluding $P$ with $\sigma(z_1, \cdots, z_{t-1}) = q$. Therefore, we have

$$f(i-1,j-1,q) = h \ge t-1. \tag{3}$$

Let $v = v_1, \cdots, v_h \in Z(i-1,j-1,q)$ be an LCS of $X[1:i-1]$ and $Y[1:j-1]$ excluding $P$ with
$\sigma(v_1, \cdots, v_h) = q$. Then $\sigma((v_1, \cdots, v_h) \oplus x_i) = \sigma(L(q) \oplus x_i) = k$, and thus $(v_1, \cdots, v_h) \oplus x_i$ is a common
subsequence of $X[1:i]$ and $Y[1:j]$ excluding $P$ with $\sigma((v_1, \cdots, v_h) \oplus x_i) = k$.

Therefore,

$$f(i,j,k) = t \ge h+1. \tag{4}$$

Combining (3) and (4) we have $h = t-1$. Therefore, $z_1, \cdots, z_{t-1}$ is an LCS of $X[1:i-1]$ and $Y[1:j-1]$
excluding $P$ with $\sigma(z_1, \cdots, z_{t-1}) = q$.
In other words,

$$f(i,j,k) \le 1 + \max_{0 \le q < s} \{ f(i-1,j-1,q) \mid \sigma(L(q) \oplus x_i) = k \} \tag{5}$$

On the other hand, for any $0 \le q < s$, if $f(i-1,j-1,q) = h$ and $\sigma(L(q) \oplus x_i) = k$, then for any
$v = v_1, \cdots, v_h \in Z(i-1,j-1,q)$, $v \oplus x_i$ is a common subsequence of $X[1:i]$ and $Y[1:j]$ with $\sigma(v \oplus x_i) = k$.
Since $v$ excludes $P$ and $\sigma(v \oplus x_i) = k < s$ is a nonleaf node, no constraint string ends at the last character of
$v \oplus x_i$, and hence $v \oplus x_i$ is a common subsequence of $X[1:i]$ and $Y[1:j]$ excluding $P$ with $\sigma(v \oplus x_i) = k$.
Therefore, $f(i,j,k) = t \ge 1 + h = 1 + f(i-1,j-1,q)$, and so we conclude that

$$f(i,j,k) \ge 1 + \max_{0 \le q < s} \{ f(i-1,j-1,q) \mid \sigma(L(q) \oplus x_i) = k \} \tag{6}$$

Combining (5) and (6) we have, in this case,

$$f(i,j,k) = 1 + \max_{0 \le q < s} \{ f(i-1,j-1,q) \mid \sigma(L(q) \oplus x_i) = k \} \tag{7}$$

Combining the two subcases in the case of $x_i = y_j$, we conclude that the recursive formula (2) is correct
for the case $x_i = y_j$.

The proof is complete. ■
4 The Implementation of the Algorithm



According to Theorem 1, our algorithm for computing $f(i,j,k)$ is a standard dynamic programming
algorithm over a three-dimensional table. By the recursive formula (2), the dynamic programming algorithm for computing
$f(i,j,k)$ can be implemented as the following Algorithm 1.

In Algorithm 1, $s$ is the number of nonleaf nodes of the keyword tree $T$ for the set $P$. The root of the keyword
tree is numbered 0, and the other nonleaf nodes are numbered $1, 2, \cdots, s-1$ in preorder.
$L(t)$ is the label of the node numbered $t$ in the keyword tree $T$.



Algorithm 1 M-STR-EC-LCS

Input: Strings $X = x_1 \cdots x_n$, $Y = y_1 \cdots y_m$ of lengths $n$ and $m$, respectively, and a set of $d$ constraints
$P = \{P_1, \cdots, P_d\}$ of total length $r$

Output: The length of an LCS of $X$ and $Y$ excluding $P$

 1: Build a keyword tree $T$ for $P$
 2: for all $i, j, k$, $0 \le i \le n$, $0 \le j \le m$, and $0 \le k < s$ do
 3:   $f(i,0,k) \leftarrow 0$, $f(0,j,k) \leftarrow 0$ {boundary condition}
 4: end for
 5: for $i = 1$ to $n$ do
 6:   for $j = 1$ to $m$ do
 7:     for $k = 0$ to $s-1$ do
 8:       if $x_i \ne y_j$ then
 9:         $f(i,j,k) \leftarrow \max\{f(i-1,j,k),\ f(i,j-1,k)\}$
10:       else
11:         $u \leftarrow \max_{0 \le t < s} \{ f(i-1,j-1,t) \mid \sigma(L(t) \oplus x_i) = k \}$
12:         $f(i,j,k) \leftarrow \max\{f(i-1,j-1,k),\ 1+u\}$
13:       end if
14:     end for
15:   end for
16: end for
17: return $\max_{0 \le t < s} \{f(n,m,t)\}$



To implement our algorithm efficiently, the most important thing is to compute $\sigma(L(k) \oplus x_i)$ for each
$0 \le k < s$ and $x_i$, $1 \le i \le n$, in line 11 efficiently.

It is obvious that $\sigma(L(k) \oplus x_i) = g$ if there is an edge $(k, g)$ out of the node $k$ labeled $x_i$. It is more
complex to compute $\sigma(L(k) \oplus x_i)$ if there is no edge out of the node $k$ labeled $x_i$. In this case, the matched
node label has to be shortened to the longest proper suffix of $L(k)$ that is a prefix of some string in $P$ and whose
corresponding node $h$ has an out edge $(h, g)$ labeled $x_i$. Therefore, in this case, $\sigma(L(k) \oplus x_i) = g$.
If no such node $h$ exists, then $\sigma(L(k) \oplus x_i) = 0$, the root.



Algorithm 2 $\sigma(k, ch)$

Input: Integer $k$ and character $ch$
Output: $\sigma(L(k) \oplus ch)$

while $k \ge 0$ do
  if there is an edge $(k, h)$ labeled $ch$ out of the node $k$ of $T$ then
    return $h$
  else
    $k \leftarrow pre(k)$ {with the convention $pre(0) = -1$, so that the loop ends after the root has been tried}
  end if
end while
return 0



This computation is very similar to the search algorithm in the keyword tree $T$ for the multiple string
matching problem [2, 7].
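The fallback along failure links can be sketched as follows. The loop is written so that the root is tried last and no sentinel value for the failure link of the root is needed, which is a design choice of this sketch. The constants `CHILDREN` and `PRE` are hand transcriptions of the keyword tree and failure function for $P = \{aab, aba, ba\}$ with all eight nodes numbered in preorder; the names are illustrative.

```python
# Keyword tree for P = {aab, aba, ba}:
# 0 = root, 1 = a, 2 = aa, 3 = aab, 4 = ab, 5 = aba, 6 = b, 7 = ba.
CHILDREN = [
    {"a": 1, "b": 6}, {"a": 2, "b": 4}, {"b": 3}, {},
    {"a": 5}, {}, {"a": 7}, {},
]
PRE = [0, 0, 1, 4, 6, 7, 0, 1]  # failure links; the root points to itself


def sigma_step(children, pre, k: int, ch: str) -> int:
    """Node sigma(L(k) + ch): follow failure links until an edge labeled ch is found."""
    while True:
        if ch in children[k]:
            return children[k][ch]
        if k == 0:
            return 0  # no node has a matching edge, so only the empty suffix matches
        k = pre[k]
```

For example, from node 4 (label `ab`) with character `b`, the search falls back to node 6 and then to the root, where the edge `b` leads to node 6.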

With the precomputed failure function $pre$, the function $\sigma(L(k) \oplus ch)$ for each character $ch$ and $0 \le k < s$
can be computed as described in Algorithm 2.

Then, we can compute an index $t^*$ such that

$$f(i-1,j-1,t^*) = \max_{0 \le t < s} \{ f(i-1,j-1,t) \mid \sigma(L(t) \oplus x_i) = k \}$$

in line 11 of Algorithm 1 by the following Algorithm 3.



Algorithm 3 $max\sigma(i,j,k)$

Input: Integers $i, j, k$
Output: An index $t^*$ such that $f(i-1,j-1,t^*) = \max_{0 \le t < s} \{ f(i-1,j-1,t) \mid \sigma(L(t) \oplus x_i) = k \}$

$tmp \leftarrow -1$, $t^* \leftarrow -1$
for $t = 0$ to $s-1$ do
  if $\sigma(t, x_i) = k$ and $f(i-1,j-1,t) > tmp$ then
    $tmp \leftarrow f(i-1,j-1,t)$, $t^* \leftarrow t$
  end if
end for
return $t^*$



Then the value of $u$ in line 11 of Algorithm 1 must be

$$u = f(i-1,j-1,t^*) = f(i-1,j-1,max\sigma(i,j,k)).$$

We can improve the efficiency of the above algorithms further in the following two respects.

First, we can precompute a table $\lambda$ of the function $\sigma(L(k) \oplus ch)$ for each character $ch \in \Sigma$ and $0 \le k < s$
to speed up the computation of $max\sigma(i,j,k)$. When we precompute the failure function $pre$, for every edge
$(k, g)$ labeled with character $ch$, the value of $\lambda(k, ch)$ can be assigned directly to $g$. The other values of the
table $\lambda$ can be computed by using the failure function $pre$ in the following recursive algorithm.



Algorithm 4 $\lambda(k, ch)$

Input: Integer $k$, character $ch$
Output: Value of $\lambda(k, ch)$

if $k > 0$ and $\lambda(k, ch) = 0$ then
  $\lambda(k, ch) \leftarrow \lambda(pre(k), ch)$
end if
return $\lambda(k, ch)$



The time cost of computing all values $\lambda(k, ch)$ of the table for each character $ch \in \Sigma$ and $0 \le k < s$
by the above preprocessing algorithm is obviously $O(s|\Sigma|)$. By using this precomputed table $\lambda$, the value of the
function $\sigma(L(k) \oplus ch)$ for each character $ch \in \Sigma$ and $0 \le k < s$ can be computed readily in $O(1)$ time.
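An iterative alternative to the recursive table fill is to process the nodes in breadth-first order, so that the row of $pre(k)$ is always complete before the row of $k$ is filled. The sketch below takes this approach; it fills rows for all nodes of the tree, leaves included, and the names `transition_table`, `CHILDREN` and `PRE` are illustrative. The two constants are hand transcriptions for $P = \{aab, aba, ba\}$ with preorder numbering.

```python
from collections import deque

# Keyword tree and failure links for P = {aab, aba, ba}:
# 0 = root, 1 = a, 2 = aa, 3 = aab, 4 = ab, 5 = aba, 6 = b, 7 = ba.
CHILDREN = [
    {"a": 1, "b": 6}, {"a": 2, "b": 4}, {"b": 3}, {},
    {"a": 5}, {}, {"a": 7}, {},
]
PRE = [0, 0, 1, 4, 6, 7, 0, 1]


def transition_table(children, pre, alphabet):
    """lam[k][ch] = sigma(L(k) + ch) for every node k and every ch in alphabet."""
    lam = [dict() for _ in children]
    queue = deque([0])
    while queue:
        k = queue.popleft()
        for ch in alphabet:
            if ch in children[k]:
                lam[k][ch] = children[k][ch]      # a tree edge exists
            else:
                # pre[k] is shallower than k, so its row is already complete.
                lam[k][ch] = lam[pre[k]][ch] if k else 0
        queue.extend(children[k].values())
    return lam
```

Each entry is filled in constant time, so the table costs time proportional to the number of nodes times $|\Sigma|$.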

Second, the computation of the function $max\sigma(i,j,k)$ is very time consuming, and many computations
are repeated in the for loops of Algorithm 1. We can amortize the computation of
$max\sigma(i,j,k)$ over the entries of $f(i,j,k)$ in the for loop on the variable $k$ of Algorithm 1 and thus
reduce the time cost of the whole algorithm. The modified algorithm is given as Algorithm 5.

Since $\lambda(k, x_i)$ can be computed in $O(1)$ time for each $x_i$, $1 \le i \le n$, and any $0 \le k < s$, the loop body
of Algorithm 5 requires only $O(1)$ time. Therefore, our dynamic programming algorithm for computing
the length of an LCS of $X$ and $Y$ excluding $P$ requires $O(nmr)$ time and $O(r|\Sigma|)$ preprocessing time.
Algorithm 5 M-STR-EC-LCS

Input: Strings $X = x_1 \cdots x_n$, $Y = y_1 \cdots y_m$ of lengths $n$ and $m$, respectively, and a set of $d$ constraints
$P = \{P_1, \cdots, P_d\}$ of total length $r$

Output: The length of an LCS of $X$ and $Y$ excluding $P$

Build a keyword tree $T$ for $P$
for all $i, j, k$, $0 \le i \le n$, $0 \le j \le m$, and $0 \le k < s$ do
  $f(i,0,k) \leftarrow 0$, $f(0,j,k) \leftarrow 0$ {boundary condition}
end for
for $i = 1$ to $n$ do
  for $j = 1$ to $m$ do
    for $k = 0$ to $s-1$ do
      $f(i,j,k) \leftarrow \max\{f(i-1,j,k),\ f(i,j-1,k)\}$
    end for
    if $x_i = y_j$ then
      for $k = 0$ to $s-1$ do
        $t \leftarrow \lambda(k, x_i)$
        $f(i,j,t) \leftarrow \max\{f(i,j,t),\ 1 + f(i-1,j-1,k)\}$ {skipped when $t$ is a leaf of $T$}
      end for
    end if
  end for
end for
return $\max_{0 \le t < s} \{f(n,m,t)\}$
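A self-contained sketch of this forward-update scheme is given below. It deviates from the pseudocode in three labeled ways: it keeps a state for every node of the keyword tree and marks as forbidden each node whose label ends with a complete constraint string, so Assumptions 1 and 2 are not required; it uses $-1$ for states that no admissible subsequence reaches; and it stores only two rows of the table, since only the length is returned. The function name is illustrative, and constraint strings are assumed non-empty.

```python
from collections import deque


def m_str_ec_lcs_length(x: str, y: str, patterns) -> int:
    """Length of an LCS of x and y containing no string of `patterns` as a substring."""
    # Keyword tree over all patterns; node 0 is the root.
    children, forbidden = [{}], [False]
    for p in patterns:
        node = 0
        for ch in p:
            if ch not in children[node]:
                children[node][ch] = len(children)
                children.append({})
                forbidden.append(False)
            node = children[node][ch]
        forbidden[node] = True  # a complete constraint string ends here

    # Failure links `pre` and transition table `lam`, filled in BFS order so
    # that lam[pre[k]] is complete before node k is processed.
    alphabet = set(x) | set(y) | set("".join(patterns))
    size = len(children)
    pre = [0] * size
    lam = [dict() for _ in range(size)]
    queue = deque([0])
    while queue:
        k = queue.popleft()
        for ch in alphabet:
            fallback = lam[pre[k]][ch] if k else 0
            if ch in children[k]:
                v = children[k][ch]
                pre[v] = fallback
                forbidden[v] = forbidden[v] or forbidden[fallback]
                lam[k][ch] = v
                queue.append(v)
            else:
                lam[k][ch] = fallback

    # Dynamic program over (i, j, state) with two rolling rows.
    n, m = len(x), len(y)
    NEG = -1  # marks "no admissible subsequence ends in this state"
    prev = [[NEG] * size for _ in range(m + 1)]
    for row in prev:
        row[0] = 0  # the empty subsequence sits at the root
    for i in range(1, n + 1):
        cur = [[NEG] * size for _ in range(m + 1)]
        cur[0][0] = 0
        for j in range(1, m + 1):
            up, left, row = prev[j], cur[j - 1], cur[j]
            for k in range(size):
                row[k] = max(up[k], left[k])
            if x[i - 1] == y[j - 1]:
                diag = prev[j - 1]
                for k in range(size):
                    t = lam[k][x[i - 1]]
                    if diag[k] != NEG and not forbidden[t]:
                        row[t] = max(row[t], diag[k] + 1)
        prev = cur
    return max(prev[m])
```

The tree has at most $r + 1$ nodes, so the three nested loops perform $O(nmr)$ constant-time updates, in line with the stated bound.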



Until now we have assumed that our algorithm is implemented under Assumption 1 and Assumption 2
on the constraint set $P$. We now describe how to relax the two assumptions.

If Assumption 1 is violated, then there must be some duplicated strings in the constraint set $P$. In this
case, we can first sort the strings in the constraint set $P$; duplicated strings can then be removed from $P$ easily,
so that Assumption 1 on the constraint set $P$ is satisfied. It is clear that removing duplicated strings does not change
the solution of the problem.

For Assumption 2, we first notice that a string $A$ in the constraint set $P$ is a proper substring of a string
$B$ in $P$ if and only if, in the keyword tree $T$ of $P$, there is a directed path of failure links from a node $v$ on
the path from the root to the leaf node corresponding to string $B$ to the leaf node corresponding to string
$A$ [7]. For example, in Fig. 1(a), there is a directed path of failure links from node 5 to node 7, and thus we
know that the string $ba$ corresponding to node 7 is a proper substring of the string $aba$ corresponding to node 5.

With this fact, if Assumption 2 is violated, we can remove all super-strings from the constraint set $P$ as
follows. We first build a keyword tree $T$ for the constraint set $P$, then mark all nodes from which a nonempty directed
path of failure links leads to a leaf node of $T$, using a depth-first traversal of $T$. All strings whose path from
the root contains a marked node can then be removed from $P$. Assumption 2 is now satisfied on the new constraint set,
and the keyword tree $T$ for the new constraint set is then rebuilt. It is not difficult to do this preprocessing
in $O(r)$ time. It is clear that removing the super-strings does not change the solution of the problem.

If we want to obtain an LCS of $X$ and $Y$ excluding $P$, and not just its length, we can use
a simple recursive backtracking algorithm, given as Algorithm 6.

At the end of our new algorithm, we find an index $t$ such that $f(n,m,t)$ gives the length of an LCS
of $X$ and $Y$ excluding $P$. Then, a function call $back(n,m,t)$ produces the answer LCS accordingly.

Since the cost of the algorithm $max\sigma(i,j,k)$ is $O(r)$ in the worst case, the algorithm $back(i,j,k)$
costs $O(r \max(n,m))$.

Finally we summarize our results in the following Theorem. 

Theorem 2 Algorithm 5 solves the M-STR-EC-LCS problem correctly in $O(nmr)$ time and $O(nmr)$
space, with preprocessing time $O(r|\Sigma|)$.
Algorithm 6 $back(i,j,k)$

Comments: A recursive backtracking algorithm to construct the answer LCS

if $i = 0$ or $j = 0$ then
  return
end if
if $x_i = y_j$ then
  if $f(i,j,k) = f(i-1,j-1,k)$ then
    $back(i-1,j-1,k)$
  else
    $back(i-1,j-1,max\sigma(i,j,k))$
    print $x_i$
  end if
else if $f(i-1,j,k) > f(i,j-1,k)$ then
  $back(i-1,j,k)$
else
  $back(i,j-1,k)$
end if
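The backtracking idea can be sketched end to end as follows. This version keeps the full three-dimensional table, walks back iteratively instead of recursively, evaluates state transitions naively on node labels, and uses $-1$ for unreachable states; these are simplifications of this sketch, and the function name is illustrative. Constraint strings are assumed non-empty.

```python
def m_str_ec_lcs(x: str, y: str, patterns) -> str:
    """One LCS of x and y that contains no string of `patterns` as a substring."""
    # States: the root plus all proper prefixes of the constraint strings.
    labels = sorted({""} | {p[:k] for p in patterns for k in range(len(p))})
    index = {lab: k for k, lab in enumerate(labels)}

    def step(k, ch):
        """State after appending ch in state k, or None if a constraint is completed."""
        s = labels[k] + ch
        if any(s.endswith(p) for p in patterns):
            return None
        return next(index[s[a:]] for a in range(len(s) + 1) if s[a:] in index)

    n, m, states, NEG = len(x), len(y), len(labels), -1
    f = [[[NEG] * states for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        f[i][0][0] = 0
    for j in range(m + 1):
        f[0][j][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for k in range(states):
                f[i][j][k] = max(f[i - 1][j][k], f[i][j - 1][k])
            if x[i - 1] == y[j - 1]:
                for q in range(states):
                    t = step(q, x[i - 1]) if f[i - 1][j - 1][q] != NEG else None
                    if t is not None:
                        f[i][j][t] = max(f[i][j][t], f[i - 1][j - 1][q] + 1)

    # Walk back from the best final state, collecting matched characters.
    i, j = n, m
    k = max(range(states), key=lambda q: f[n][m][q])
    out = []
    while i > 0 and j > 0:
        if f[i][j][k] == f[i - 1][j][k]:
            i -= 1
        elif f[i][j][k] == f[i][j - 1][k]:
            j -= 1
        else:
            # The value came from appending x[i-1]; recover the previous state.
            ch = x[i - 1]
            k = next(q for q in range(states)
                     if f[i - 1][j - 1][q] != NEG
                     and f[i - 1][j - 1][q] + 1 == f[i][j][k]
                     and step(q, ch) == k)
            out.append(ch)
            i, j = i - 1, j - 1
    return "".join(reversed(out))
```

For $X = Y = aab$ and $P = \{ab\}$, the call returns `aa`.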



5 Concluding Remarks 

We have suggested a new dynamic programming solution for the M-STR-EC-LCS problem. The M-STR-IC-LCS
problem is another interesting generalized constrained longest common subsequence (GC-LCS) problem, which
is very similar to the M-STR-EC-LCS problem. The M-STR-IC-LCS problem is to find an LCS of two main
sequences in which a set of constraint strings must be included as substrings. It is not clear whether
the technique of this paper can be applied to this problem to achieve an efficient algorithm. We will
investigate the problem further.